Upgrading from versions 1, 2 or 3.
Keyoti SearchUnit 2012 (and 2010) incorporates some significant changes in methodology, and the amount of work required to upgrade depends on how heavily your projects use the older version's API. For example, in a simple case of indexing a web site through the supplied tools and searching via the SearchResult control, there may be only a few changes to make. However, where extensive use of our API has been made, there will be several places where code needs to change (usually a property name change or a switch to a new method), and attention may need to be paid to how configuration settings are stored.
The most fundamental change is the indexing format and methodology. Indexes from the new version are not compatible with older indexes and must be generated again. However, they are now much faster to generate, so recreating an index shouldn't be too large a task.
Further, new indexes are 'built' as documents are added (e.g. via importing). The separate 'Build' operation has been removed entirely. This makes all operations more efficient and incremental indexing much quicker.
Step 1 - Creating the new index files
Porting Index Settings And Sources
If you have a v3 index directory that you wish to convert to the new format, create a new folder for the new index and copy all XML files from the v3 index directory into it. Doing this preserves configuration settings, location/content category setups, and imported sources (their import parameters, not the actual documents).
- If the index is critical, it's recommended that a backup of the original index files is kept until the new index is verified.
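The copy step above can also be scripted. The following is a minimal sketch using only System.IO; the class and method names (IndexSettingsPorter, CopyXmlSettings) are hypothetical and not part of the SearchUnit API, and it assumes the settings files sit directly in the index directory rather than in subfolders.

```csharp
using System;
using System.IO;

public static class IndexSettingsPorter
{
    // Copies every top-level .xml file from the old index directory into the
    // new one and returns the number of files copied. The old directory is
    // only read from, so it can serve as the backup.
    public static int CopyXmlSettings(string oldIndexDir, string newIndexDir)
    {
        if (!Directory.Exists(oldIndexDir))
            throw new DirectoryNotFoundException("Old index directory not found: " + oldIndexDir);

        Directory.CreateDirectory(newIndexDir); // does nothing if it already exists

        int copied = 0;
        foreach (string source in Directory.GetFiles(oldIndexDir, "*.xml"))
        {
            // The "*.xml" pattern can also match longer extensions on some
            // .NET versions, so check the extension exactly.
            if (!string.Equals(Path.GetExtension(source), ".xml", StringComparison.OrdinalIgnoreCase))
                continue;

            string target = Path.Combine(newIndexDir, Path.GetFileName(source));
            File.Copy(source, target, false); // false: fail rather than overwrite
            copied++;
        }
        return copied;
    }
}
```

A call such as IndexSettingsPorter.CopyXmlSettings(oldDir, newDir), with your own two paths, leaves the v3 directory untouched. Because overwriting is disabled, the copy throws if the target folder already contains a file of the same name.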
Step 2 - Changing references
Change the project references from the old version DLLs to the version 4 DLLs.
Step 3 - Upgrading web.config
Check the web.config for any mention of our DLLs with specific version numbers, and change each version number to match the version of the DLL actually referenced. You can find the version of a DLL through Windows Explorer -> Properties -> Version (tab).
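As an alternative to Windows Explorer, the version can be read programmatically. The sketch below uses the standard System.Reflection.AssemblyName.GetAssemblyName call; the class name AssemblyVersionReader is hypothetical, and the path you pass in is whichever DLL your project references. It reports the assembly version, which is the value .NET compares against assembly references in web.config and which can differ from the file version.

```csharp
using System;
using System.Reflection;

public static class AssemblyVersionReader
{
    // Returns the assembly version from the DLL's manifest, e.g. "1.2.3.4".
    // The file is opened and closed; the assembly is not loaded into the
    // current application domain.
    public static string GetAssemblyVersion(string dllPath)
    {
        AssemblyName name = AssemblyName.GetAssemblyName(dllPath);
        return name.Version.ToString();
    }

    // Returns the full display name (name, Version, Culture, PublicKeyToken),
    // which is the form used in fully qualified assembly references.
    public static string GetFullName(string dllPath)
    {
        return AssemblyName.GetAssemblyName(dllPath).FullName;
    }
}
```

Comparing the output of GetFullName with the strings in web.config makes it easy to spot a stale Version= value.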
Step 4 - Upgrading classes
If you have used our API from your code, you will need to make some changes.
- As a result of the immediate indexing (see above), the DocumentImporter class has been merged with DocumentIndex. This means that any calls you may have to DocumentImporter should be moved to DocumentIndex.
For example, the old routine of
DocumentImporter imp = new DocumentImporter(config);
imp.Open();
imp.Import(...);
imp.Close();
DocumentIndex idx = new DocumentIndex(config);
idx.Open();
idx.Build();
idx.Close();
can be changed to just
DocumentIndex idx = new DocumentIndex(config);
idx.Import(...);
idx.Close();
//Note: the Open() method is now called automatically in the DocumentIndex constructor.
- There are now convenience methods for common imports, e.g. ImportWebsite, which are easier to use than "Import(...)".
- Optimization. The new index format achieves greater efficiency during indexing, in part by minimizing the amount of redundant optimization performed. In v3 the index was always optimized for searching, whereas in new versions the index should be optimized after batch operations have been performed (e.g. after imports, adds and deletes). Optimization usually takes seconds to minutes, depending on the index size.
The UIs now have an Optimize button. To optimize programmatically, call DocumentIndex.Optimize(), e.g.
DocumentIndex idx = new DocumentIndex(config);
idx.Import(...);
idx.Optimize();
idx.Close();
or
DocumentIndex idx = new DocumentIndex(config);
idx.Optimize();
idx.Close();
Optimization can improve search performance by a factor of ~5 and has no effect on future indexing performance.
- In order to obtain speed improvements we have removed the ability to search in a case-sensitive manner. All searches are now case-insensitive.
- We have implemented 'stop words', which are common words that are not indexed. This results in speed improvements during indexing and searching.
The stop list can be modified by editing the file 'stoplist.txt', which is created in the index directory when the directory is first initialized (i.e. before anything is actually imported). The contents of the file may be deleted if you wish to deactivate stop lists, or they may be replaced with a stop list for another language (which we will provide in the install dir). Alternatively, calling Configuration.StopWords.Clear() immediately after constructing a DocumentIndex or SearchAgent object will clear the stop list.
It is recommended that the following ASPX code is added to the HeaderTemplate and NoResultsTemplate (in the SearchResults control) to inform the user of which search words were ignored.
<%# Container.IgnoredWordsMessage %>
- You will probably find a few compilation errors, but please read them carefully as they should include information on the API change (i.e. the new method to use instead). Some changes are purely cosmetic (e.g. name changes), and many are just minor argument changes.
- If you use the Central Event System to handle Action events of type ActionName.ResultItemsFinalized, note that the Data property no longer returns type ArrayList, but Keyoti.SearchEngine.Utils.ResultItemList.
For example,
if (e.ActionData.Name == ActionName.ResultItemsFinalized)
{
    ArrayList results = e.ActionData.Data as ArrayList;
    // ... work with results ...
}
should be changed to
if (e.ActionData.Name == ActionName.ResultItemsFinalized)
{
    ResultItemList results = e.ActionData.Data as ResultItemList;
    // ... work with results ...
}
- Configuration.CacheCrawlLinks: due to a change in the way documents are crawled and indexed (now simultaneously), it is recommended that Configuration.CacheCrawlLinks is set to false. This is the default in new indexes, but existing index configurations should be changed, otherwise excessive memory may be consumed.
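For the stop list item above, editing 'stoplist.txt' can be done by hand or scripted. The following is a rough sketch using only System.IO; the StopListTool class, its method names and the ".bak" backup name are all hypothetical and not part of the SearchUnit API. It assumes the index is not in use while the file is changed, and this document does not say whether existing documents need to be re-imported afterwards.

```csharp
using System;
using System.IO;

public static class StopListTool
{
    private const string StopListFileName = "stoplist.txt";

    // Empties stoplist.txt in the given index directory. The original contents
    // are saved once as stoplist.txt.bak (an existing backup is never replaced).
    // Returns false if the directory has no stop list file.
    public static bool DeactivateStopList(string indexDir)
    {
        string stopListPath = Path.Combine(indexDir, StopListFileName);
        if (!File.Exists(stopListPath))
            return false;

        string backupPath = stopListPath + ".bak";
        if (!File.Exists(backupPath))
            File.Copy(stopListPath, backupPath);

        File.WriteAllText(stopListPath, string.Empty); // keep the file, drop its contents
        return true;
    }

    // Overwrites the stop list with another file, e.g. a list for another language.
    public static void ReplaceStopList(string indexDir, string replacementPath)
    {
        File.Copy(replacementPath, Path.Combine(indexDir, StopListFileName), true);
    }
}
```

Keeping a one-time backup means the shipped list can be restored later with ReplaceStopList, even after the stop list has been emptied more than once.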
Compilation
Typically the best way to see what code needs to change is to compile it against the new DLLs. There are numerous cases where a minor change has been made to a member name (to bring it in line with the .NET naming conventions) and cases where members have been made obsolete. In all cases the compiler error should include a suggestion of an alternative to use.
Pro Version Upgrades
In addition to everything described in this document, Pro version upgrades require the Windows Service and Web Admin to be uninstalled (using the administration MSI, located under the Start menu). Then, after running the main installer, install the new administration MSI.
General breaking changes
- SearchBoxOptions, by default, will automatically create a location drop-down menu and content category check boxes for the user. If you are using this control you may prefer to set AutoGenerateLocationDropDown and AutoGenerateContentCheckBoxes to false, to make it revert to the older, manual behavior.
- The IDataAccess interface has been removed; please declare variables as XmlDataAccess instead. It is preferred that you work with DocumentIndex rather than XmlDataAccess when possible.